SUMMARY



Smithtown is an intelligent tutoring system designed to enhance an individual's scientific
inquiry skills, as well as to provide an environment for learning principles of basic microeconomics.
It was hypothesized that computer instruction on applying effective interrogative skills (e.g.,
changing one variable at a time while holding all else constant) would ultimately lead to the
acquisition of the specific subject matter. This paper presents an evaluation of Smithtown in
two studies of individual differences in learning. Experiment 1, an exploratory study, demonstrated
that Smithtown fared very well when compared to traditional instruction on economics and
delineated the performance indicators that separated better from worse learners in this discovery
environment. Experiment 2 extended the findings from the exploratory study using a large
sample of subjects (N = 530) from a different population. Results showed that the performance
indicators relating to hypothesis generation and testing were the most predictive of successful
learning in Smithtown, accounting for considerably more of the variance in the learning criterion
than a measure of general intelligence. Overall, the system performed as expected. Tutoring
on scientific inquiry skills resulted in increased knowledge of microeconomics. The differentiating
behaviors between more and less successful learners were in agreement with specific behaviors
relating to individual differences found in general studies on problem solving and concept
formation. From an instructional perspective, the behaviors denoted can serve as a focal point
for relevant intervention studies. From a design perspective, findings from these studies suggest
modifications to intelligent tutoring systems so they may be more like the individualized teaching
systems they have the potential to be.
INDIVIDUAL DIFFERENCES IN LEARNING FROM AN INTELLIGENT
DISCOVERY WORLD: SMITHTOWN 



I. INTRODUCTION 

Smithtown is an intelligent tutoring system designed as a guided discovery world whose
primary goal is to assist individuals in becoming more systematic and scientific in their discovery
of laws for a given domain. A second goal of the system is to impart specific content knowledge
in microeconomics, specifically the laws of supply and demand.

This paper presents a large-scale evaluation of Smithtown with regard to these two goals.
The first study compared declarative knowledge acquisition among subjects interacting with
this system, students enrolled in an introductory economics course, and a control group, using
both quantitative and qualitative measures. Focusing on the group interacting with Smithtown,
differences were analyzed in the behaviors (or performance indicators) between those individuals
who were successful in this type of discovery environment and those who were less successful.
"Success" was defined as a large gain score in performance from a pretest battery of economic
concepts to posttest battery scores. The second study analyzed data from a large group of
subjects (N = 530) interacting with the system to see which performance indicators were
correlated with a dependent measure of learning as well as a general intelligence measure. Of
interest was whether these data replicated the findings from the first study, which employed a
smaller sample from a different population, as well as the nature and range of the
relationship between general intelligence and learning.
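
As a minimal sketch of the success criterion described above, a gain score can be read as the
posttest score minus the pretest score. This reading, the function name, and the example scores
are assumptions made for illustration; they are not the scoring procedure or data from the study.

```python
def gain_score(pretest: float, posttest: float) -> float:
    """Return the posttest score minus the pretest score."""
    return posttest - pretest


# Hypothetical (pretest, posttest) scores, not data from the study.
scores = {"subject_a": (40.0, 75.0), "subject_b": (55.0, 60.0)}

gains = {name: gain_score(pre, post) for name, (pre, post) in scores.items()}

# Subjects ordered from largest to smallest gain.
ranked = sorted(gains, key=gains.get, reverse=True)
```

Under this reading, the subject with the larger gain would count as the more successful learner,
regardless of which subject started with the higher pretest score.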

Scientific inquiry can be seen as a problem solving activity involving both top-down and
bottom-up processing of information (Greeno & Simon, 1988). Of particular interest to this
research is the training of scientific inquiry skills, which include: (a) generating and testing
hypotheses; (b) observing, recording, and organizing data resulting from experimental tests; (c)
modifying hypotheses in accordance with the results; and (d) inducing regularities and laws
(Shute, Glaser, & Raghavan, 1989).

Generating and testing hypotheses using observations and empirical findings is important to
scientific work, as well as to the acquisition of knowledge in general. When hypotheses are
generated and new information is obtained, they serve as a basis for confirming or refuting
perceived regularities and lawful relationships. There are two problems associated with induction
and hypothesis testing. First, many learners can induce regularities or patterns, but do not
treat them as hypotheses to be tested. Second, even when subjects realize that they should
test a hypothesis, they may use faulty methods or procedures that do not guarantee that the
inferences drawn from the data are reasonable or relevant to the world or system being observed.

Previous studies of induction have mainly focused on inducing a rule or classifying relatively
abstract stimuli into categories on the basis of feedback about classification errors and other
information (see Pellegrino & Glaser, 1980; Smith & Medin, 1981). This large literature can be
seen as relating mostly to passive induction, where learners induce rules, make hypotheses, and
classify and taxonomize observations on the basis of experimenter-controlled presentation of
predetermined instances. However, a more active process is apparent when the learner can
select variables, design instances, and interrogate his or her existing knowledge and memory
for recent events. To study the latter form of induction, this paper discusses the application
of a research paradigm that allows examination of active experimentation in which learners
explore and generate new data and test hypotheses with the data they have accumulated in
the course of their investigations. Recent experimental technology and computer modeling have
made this type of experimentation feasible (Bonar, Cunningham, & Schultz, 1986; Klahr & Dunbar,
1988; Michalski, 1986; Yazdani, 1986).
Facilitation of scientific inquiry skills has been investigated by White and Horowitz (1987)
via their "Thinker Tools" environment. Their approach was to first motivate students to want to
learn by pointing out errors and inconsistencies in their current beliefs. Second, the students
were guided through a series of microworlds, each one more complex than the preceding one,
with the objective of evolving more precise mental models of the subject matter (i.e., Newtonian
mechanics). Third, the students had to formalize their developing mental models by evaluating
a set of laws describing phenomena in the microworld. Finally, the students had to apply the
selected law to see how it predicted real world phenomena.

A difference between White and Horowitz's approach and the approach outlined in this paper
is the degree of student control in the learning process. In particular, it is believed that a
more active process can be more facilitating to knowledge and skill acquisition, especially in
conjunction with tutorial assistance on strategies related to testing generalizations. Furthermore,
it is postulated that discovery learning can contribute to a rich understanding of domain
information by enabling students to access and organize information themselves. Thus, applying
interrogative skills is the "active process" that leads to learning in discovery situations. A
proposition to be evaluated in this research is that effective interrogative skills are teachable
or trainable if they can be articulated and practiced under circumstances which require their
use.

Another hypothesis to be tested by this research is that intelligent tutorial guidance on
effective inquiry skills, combined with a discovery world environment, can transform haphazard
problem solving procedures into efficient, methodical learning procedures. Such a transformation
arises from an individual's own actions and hypotheses. Thus, a second proposition to be
evaluated in this research is that focusing on the tutoring of specific inquiry skills should
consequently lead to learning the subject matter.

The remainder of this paper is organized as follows. First, an overview of the system
is presented, along with the two knowledge bases in Smithtown: inductive inquiry skills and
economic knowledge. Second, maneuvering within the environment is illustrated. Third, an
exploratory investigation (Experiment 1) into individual differences in learning within this
environment is described. Fourth, results are presented from comparing the learning outcomes of
subjects using Smithtown with another instructional treatment and no treatment. This section also
includes the results from an analysis of effective performance characteristics of the subjects in
the experimental group. Fifth, a large-scale confirmatory analysis of data obtained from subjects
using Smithtown (Experiment 2) is discussed. Finally, a general discussion of the educational and
scientific importance of these studies is provided.



II. SMITHTOWN 

The main goal of Smithtown is to enhance students' general problem solving and inductive 
learning skills. It does this in the context of microeconomics, providing an environment that 
fosters learning the laws of supply and demand. Smithtown is a highly interactive program, 
allowing students to pose questions and conduct experiments within the computer environment, 
testing and enriching their knowledge bases of functional relationships by manipulating various 
economic factors. 

Since Smithtown was designed to be a guided discovery environment, there is no fixed
curriculum. Rather, the student, not the system, generates his or her own hypotheses and
problems. After generating a hypothesis (e.g., "Does increasing the price of coffee affect the
demand for Cremora?"), the student tests it by executing a series of actions, such as collecting
baseline data on Cremora (e.g., the equilibrium price, the quantity demanded at that price),
entering the coffee market and increasing the price of coffee, then returning to the Cremora
market and observing the ensuing changes to relevant variables. To make this effect more
salient, data may be plotted, such as superimposing two demand curves for Cremora, both
before and after the price change was made to coffee. This series of actions for creating and
executing a given "experiment" defines a student solution.

Smithtown has the instructional goal of teaching general problem solving skills. Instead
of a curriculum-based instructional sequence, Smithtown relies on a process of constantly
monitoring student actions, looking for evidence of good and poor behaviors, and then coaching
students to become more effective problem solvers. Coaching transpires only if a subject
demonstrates three buggy behaviors or errors of omission in the environment, which keeps the
coach unobtrusive.

The system keeps a detailed history list of all student actions, grouping them into (i.e., 
interpreting them as) behaviors and solutions. Smithtown diagnoses solution quality in two 
ways. It looks for overt errors by comparing student solutions with its "buggy critics" which 
are sets of actions or non-actions that constitute suboptimal behaviors. It also compares student 
solutions with its own "good critics" or expert solutions. Discrepancies between the two are 
collected into a list of potential problem areas and passed on to the coach for possible
remediation. 

Another area of "intelligence" in Smithtown resides in its knowing about economic relationships
among different variables. After a student conducts an experiment or series of experiments,
collects data testing a hypothesis, and evaluates the results, he or she is in a position to make
a generalization about an economic phenomenon. The student may state this generalization
in the hypothesis window. The system compares the learner's input with known relationships,
and if the student states a valid principle or law governing economic variables (e.g., as price
increases, quantity demanded decreases), he or she is informed that the articulated hypothesis
was correct (e.g., "Congratulations! You have just discovered what economists refer to as the
law of demand"). Otherwise, the student is informed that the hypothesis was incorrect or was
not understood by the system.
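
The comparison of a stated hypothesis with known relationships might be sketched as a simple
lookup. The tuple encoding, the table contents, and the function name below are hypothetical;
the paper does not describe Smithtown's internal representation.

```python
# Hypothetical encoding of known relationships:
# (cause variable, cause change, effect variable) -> (effect change, law name)
KNOWN_LAWS = {
    ("price", "increases", "quantity demanded"): ("decreases", "law of demand"),
    ("price", "increases", "quantity supplied"): ("increases", "law of supply"),
}


def evaluate_hypothesis(cause, cause_change, effect, effect_change):
    """Classify a hypothesis as correct, incorrect, or not understood."""
    key = (cause, cause_change, effect)
    if key not in KNOWN_LAWS:
        # The statement does not map onto any relationship in the table.
        return "not understood"
    expected_change, law = KNOWN_LAWS[key]
    if effect_change == expected_change:
        return f"correct: {law}"
    return "incorrect"
```

For example, stating that quantity demanded decreases as price increases would be classified
as correct and labeled with the law of demand, which corresponds to the congratulatory feedback
quoted above.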

Figure 1 shows the flow of information and control in Smithtown, that is, the possible
interactions one may have with the system as well as the modules of the program. The student
takes some action in the environment. Particular sequences of actions constitute different
inquiry behaviors that the system matches against known behaviors, both good and bad. The
system's coach provides feedback to the student based on the type of error shown (i.e., an
overt error: demonstrating a buggy behavior, or an error of omission: not doing something
that was appropriate at the time), mediated by some pedagogical heuristics (e.g., first address
buggy behaviors before addressing errors of omission, or if several buggy behaviors are
confirmed, address the more critical one with the higher weight first). If the student action is
to specify a hypothesis, the system pattern matches the statement against known economic
relationships and provides feedback appropriately.
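
The two pedagogical heuristics mentioned above (buggy behaviors before errors of omission, and
higher weight first) can be sketched as a single ordering rule. The class, field names, and
weight scale are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    name: str
    kind: str      # "buggy" or "omission"
    weight: float  # higher means more critical (hypothetical scale)


def next_to_address(diagnoses):
    """Pick the diagnosis to coach first, or None if there is nothing to address."""
    if not diagnoses:
        return None
    # Buggy behaviors sort before omissions; within a kind, higher weight wins.
    return min(diagnoses, key=lambda d: (d.kind != "buggy", -d.weight))
```

With this ordering, a heavily weighted error of omission would still wait until all confirmed
buggy behaviors had been addressed.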

Most intelligent instructional systems require some kind of knowledge base (declarative or 
procedural) to be learned. Smithtown's two system knowledge bases will now be detailed: 
one for inductive inquiry skills (procedural knowledge, or knowing how to do X) and the other
for economic concepts (declarative knowledge, or knowing about X). 

Inductive Inquiry Skills. Scientific inquiry behaviors were delineated and categorized by an
earlier study conducted with Smithtown, yielding information about effective and ineffective
behaviors for interrogating a new domain (see Shute & Glaser, in press). Some examples of
"good" inquiry behaviors would be changing one variable at a time while holding everything
else constant and conscientiously recording relevant data in the online notebook. These
behaviors were coded into rules, and the system monitors a learner's actual behaviors with
respect to these rules. Thus the system recognizes sequences of good behaviors and also
sequences of ineffective or buggy behaviors.
Figure 1. Flow Chart of Information and Control in Smithtown.

If a student is performing satisfactorily in the environment (i.e., not repeatedly manifesting
buggy behaviors and learning economic concepts at a reasonable rate), he or she will not be
interrupted except to receive occasional congratulatory feedback when relevant. However, if
the system determines that a student is floundering or demonstrating buggy behaviors, the
coach will intervene and offer assistance on the specific problematic behavior(s). For instance,
if a student changes many variables at one time without first looking at the baseline data, the
following rule would be invoked (paraphrased):

If   the student changes more than two variables at a time prior to
     collecting baseline data for a given market, and it is early in the
     session where the experiment number is less than four,

Then increment the "Multiple Variable Changes" bug count by 1 and pass
     the list to the coach for possible assistance.

If this rule count surpasses a threshold value (e.g., three times), then the coach would
appear on the screen, informing the student, "I see that you're changing several variables at
the same time. A better strategy would be to enter a market, see what the data look like
before any variables have been changed, then just change one variable while holding all the
others constant."
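
The paraphrased rule and its threshold might be sketched as follows. The class and parameter
names are hypothetical, and the choice to trigger the coach on the third occurrence is an
assumption; "surpasses" could also be read as strictly more than three.

```python
# Assumption: the coach is triggered once the count reaches three.
BUG_THRESHOLD = 3


class MultipleVariableChangesRule:
    """Counts occurrences of the "Multiple Variable Changes" bug."""

    def __init__(self):
        self.bug_count = 0

    def observe(self, variables_changed, baseline_collected, experiment_number):
        """Return True when the coach should offer assistance."""
        fired = (
            variables_changed > 2          # more than two variables at a time
            and not baseline_collected     # before collecting baseline data
            and experiment_number < 4      # early in the session
        )
        if not fired:
            return False
        self.bug_count += 1
        return self.bug_count >= BUG_THRESHOLD
```

Because the count must build up before the coach appears, a single lapse early in a session
would not interrupt the student.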
I also created a list of specific performance measures to determine the type of actions
yielding differential performance in this environment. These performance measures or "learning
indicators" were extracted from the student history list and arrayed by complexity, from low-level,
simple counts of actions (e.g., total number of notebook entries made) to higher-level, complex
behaviors (e.g., number of times a generalization of a concept was made across related goods).
These indicators appear in Appendix A and serve as one data source in this study on individual
differences in learning in Smithtown.
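
A low-level indicator such as the total number of notebook entries could be computed from a
history list along the following lines. The history format and action names are hypothetical;
the actual indicators are those listed in Appendix A.

```python
from collections import Counter

# Hypothetical history list: one (experiment_number, action) pair per student action.
history = [
    (1, "notebook_entry"),
    (1, "change_variable"),
    (1, "notebook_entry"),
    (2, "set_up_graph"),
    (2, "notebook_entry"),
]


def count_actions(history):
    """Tally how often each action appears in the history list."""
    return Counter(action for _, action in history)


indicators = count_actions(history)
total_notebook_entries = indicators["notebook_entry"]
```

Higher-level indicators, such as generalizing a concept across related goods, would require
interpreting sequences of actions rather than counting single actions.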

Economic Concepts. The second knowledge base concerns functional relationships among 
economic variables relating to supply and demand in a competitive market. The concepts were 
selected following discussions with an economics professor about relevant concepts for an 
introductory microeconomics course. Definitions can be seen below: 

DEMAND: The buyer's side of the market is called demand. The law of demand says that
the quantity of a product which consumers would be willing and able to buy during some
period of time is inversely related to the price of the product. Graphing this relationship results
in a demand curve showing how the quantity demanded of a good or service will change as
the price of that good or service changes, holding all other factors constant.

SUPPLY: The seller's side of the market is called supply. The law of supply says that the
quantity of a product which producers would be willing and able to produce and sell is positively
related to the price of the product. Graphing price and quantity supplied results
in a supply curve.

EQUILIBRIUM: There are many factors that influence the price of a given product, but
when a price is reached where the quantity supplied is equal to the quantity demanded, that
market is at a point of equilibrium. Competitive markets always tend toward points of equilibrium.

SURPLUS: If the market price is higher than the equilibrium price, buyers will demand
smaller quantities than sellers are supplying. This will create a surplus. Surpluses of unsold
goods will tend to lower the price toward the equilibrium level.

SHORTAGE: If the market price is lower than the equilibrium price, buyers will demand
larger quantities than sellers are supplying, thus creating a shortage. Shortages will lead to
price increases, and the price will rise toward the equilibrium level.

CHANGE IN DEMAND: A change to certain variables other than price will cause the entire
demand curve to shift, depending on which variable is changed and the magnitude of the
adjustment. Some of the variables in Smithtown that can be manipulated and that shift the
demand curve are: per capita income, population, interest rates, weather, consumer preferences,
and the price of substitute and complementary goods.

CHANGE IN SUPPLY: Again, changing certain variables other than price will cause the
entire supply curve to shift, depending on the variable and the amount of change. In Smithtown,
the variables or "town factors" that can be manipulated to effect a supply curve shift include:
labor costs, number of suppliers, as well as some of the variables mentioned above (e.g.,
weather).

NEW EQUILIBRIUM POINT: Equilibrium, once established, can be disturbed by changes in
demand and/or supply. If demand and/or supply change, a surplus or shortage will result at
the original price, and the price will move toward a new equilibrium. A shortage at the original
price will cause the old price to rise to the new level and cause changes in the quantities
supplied and demanded. A new equilibrium will be established at the second price and the
second quantity.
ADDITIONAL CONCEPTS: Besides the above economic concepts, at least two more can
be learned from the discovery world, although they are not explicitly recognized by the system:
cross elasticity of demand and supply. Cross elasticity of demand indicates how a change in
one market affects the demand in a related market, while cross elasticity of supply indicates
how a change in one market affects the supply in a related market.
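
The definitions of equilibrium, surplus, shortage, and a new equilibrium point can be illustrated
with a small worked example. The linear curves and all coefficients below are assumptions made
for illustration only; the paper does not specify the market model used inside Smithtown.

```python
def quantity_demanded(price, intercept=100.0, slope=10.0):
    """Hypothetical linear demand: quantity falls as price rises."""
    return intercept - slope * price


def quantity_supplied(price, intercept=20.0, slope=10.0):
    """Hypothetical linear supply: quantity rises as price rises."""
    return intercept + slope * price


def equilibrium_price(d_intercept=100.0, d_slope=10.0,
                      s_intercept=20.0, s_slope=10.0):
    """Price at which quantity demanded equals quantity supplied."""
    return (d_intercept - s_intercept) / (d_slope + s_slope)


def market_state(price, d_intercept=100.0):
    """Label the market at a given price and report the size of the gap."""
    demanded = quantity_demanded(price, intercept=d_intercept)
    supplied = quantity_supplied(price)
    if supplied > demanded:
        return ("surplus", supplied - demanded)
    if demanded > supplied:
        return ("shortage", demanded - supplied)
    return ("equilibrium", 0.0)
```

With these numbers the equilibrium price is 4.0; a price of 5.0 produces a surplus of 20 units
and a price of 3.0 a shortage of 20 units. Shifting demand outward (a demand intercept of 120.0)
creates a shortage of 20 units at the original price of 4.0, and the equilibrium price rises
to 5.0.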

To learn the concepts embedded in Smithtown, students are free to manipulate variables,
observe the effects, and apply the online tools to organize their information in an effective way.
Tools available for these activities include a notebook for collecting data from experiments
(Figure 2), a table to organize data from the notebook (Figure 3), a graph utility to plot data
(Figure 4), and a hypothesis menu to formulate relationships among variables (Figure 5). Three
history windows allow the students to see a chronological listing of all actions, data, and
concepts learned.




[Figure 2 screen image. Legible content: a market window listing Good/Service, Experiment No.,
Quantity Demanded, Quantity Supplied, Shortage, Surplus, and Price, alongside menu options
including Make a Notebook Entry, Set Up Graph, and State a Hypothesis.]

Figure 2. Online Notebook and Other Screen Features of Smithtown.
Figure 3. Table Package for Ordering Data.



Students' experiments are independently created and executed, and thus unique to each individual.
The system recognizes two types of systematic investigations: (a) explorations: observing and
obtaining information from Smithtown in order to generate hypotheses about microeconomic
concepts and laws, and (b) experiments: a series of student actions conducted to confirm or
differentiate hypotheses (see Shrager, 1985, for a similar demarcation). Experiments are associated
with a specific prediction from the "Prediction" menu while explorations are not. Moreover, the
system does not provide coaching while a learner is in "exploratory mode." Only when a person
is classified as conducting an experiment does he or she receive feedback regarding their
actions.
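
The distinction between explorations and experiments, and its effect on coaching, might be
sketched as follows. The class and attribute names are hypothetical; the only behavior taken
from the text is that an investigation with a prediction counts as an experiment and that
coaching is withheld during explorations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Investigation:
    actions: List[str] = field(default_factory=list)
    prediction: Optional[str] = None  # statement from the "Prediction" menu, if any

    @property
    def mode(self) -> str:
        """An investigation with a prediction is an experiment."""
        return "experiment" if self.prediction else "exploration"

    @property
    def coaching_enabled(self) -> bool:
        """Coaching is provided only while conducting an experiment."""
        return self.mode == "experiment"
```

A student who changes a town factor without entering a prediction would therefore be left
alone by the coach, even if the same actions would trigger feedback inside an experiment.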

The sequence of events a person goes through in creating an experiment is fixed. First,
a student selects a market to investigate from the "Goods Menu" (see Figure 6a). The selection
of markets to include in the system was based on inherent and interesting relationships existing
among different goods, such as complementary associations (e.g., ground beef and hamburger
buns), substitute goods (e.g., coffee and tea), as well as more complex relationships (e.g., large
cars, compact cars, and gasoline). Next, the student informs the system of his or her experimental
intentions by identifying variables of interest from the "Planning Menu" (see Figure 6b). After
choosing the focal variables for further experimentation, a student is free to make changes to
any of the town factors (see Figure 6c). For each new experiment, the system asks the student
if he or she would like to make a prediction regarding the planned experiment. If the student
says "No," the next menu to appear is the "Things To Do Menu." If the student replies "Yes,"
a window appears where specific statements can be entered about predicted outcomes. For
example, if a student wanted to investigate the relationship between income and the demand
for large cars, and then proceeded to increase the per capita income, a correct prediction
would be, "Demand for large cars will increase."
Figure 4. Graph Package for Plotting Data.
Figure 5. Hypothesis Menu with the Law of Demand Specified.



Subsequent to setting up an experiment, the subject engages in activities from the "Things
To Do" menu. All formal experiments are implemented from this main menu, where 10 options
are provided, outlined below.

1. See Market Sales Information. This window displays the current information on
the state of the market (see Figure 2, "Market Window").

2. Computer Adjust Price. The computer will increase or decrease the price, whichever
brings the current market closer to equilibrium. This occurs in successive approximations rather
than changing the state immediately into equilibrium.

3. Self Adjust Price. This option provides the student with an online calculator and
allows the price of the particular good to be changed within a prescribed range of values,
specific and realistic to each good.

4. Make a Notebook Entry. The student selects variables to record and the current
values are automatically put into the notebook.

5. Set Up Table. The table package allows the student to select variables of interest
from the notebook, put them together in a table, and sort on any selected variable, in ascending
or descending order.
6. Set Up Graph. The graph utility allows a student to plot data collected from his/her
explorations and experiments. This provides an alternative way of viewing relations between 
variables. 



7. Make a Hypothesis. The hypothesis menu allows students to make inductions or
generalizations from relationships in the data they have collected and organized. There are
four connected menus of words and phrases comprising the hypothesis menu. First, the
"connector menu" includes the items: if, then, as, when, and, and the. The "object menu"
contains the economic variables used by the system. The "verb menu" describes the types of
change, like decreases, increases, shifts as a result of, and so on. Finally, the "direct object
menu" allows for more precise specification of concepts such as: over time, along the demand
curve, changes other than price, etc.

8-10. Experimental Frameworks. Three "experimental frameworks" provide the student
with easy maneuvering within and between experiments. These include: Change Good, Same
Variable(s); Same Good, Change Variable(s); and Change Good, Change Variable(s). They are
used to change to a new market while holding the independent variables the same, change
town factor(s) while holding the market constant, or to change both the town factor(s) and the
market, respectively.
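
The successive approximation described under option 2 (Computer Adjust Price) can be sketched
as a damped price update. The linear market, the damping constant, and the function names are
assumptions made for illustration; the paper does not state how the system computes each step.

```python
def excess_demand(price: float) -> float:
    """Quantity demanded minus quantity supplied in a hypothetical linear market."""
    demanded = 100.0 - 10.0 * price
    supplied = 20.0 + 10.0 * price
    return demanded - supplied


def computer_adjust_price(price: float, damping: float = 40.0) -> float:
    """Take one step toward equilibrium rather than jumping straight to it."""
    # A shortage (positive excess demand) raises the price; a surplus lowers it.
    return price + excess_demand(price) / damping


price = 2.0
path = [price]
for _ in range(3):
    price = computer_adjust_price(price)
    path.append(price)
```

With these hypothetical numbers the price moves through 2.0, 3.0, 3.5, and 3.75, approaching
the equilibrium price of 4.0 without reaching it in a single step.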
Figure 6a. Goods and Services Menu (Smithtown Markets).
Figure 6b. Planning Menu Items (Local Variables for Upcoming Experiment).
Figure 6c. Town Factors Menu.



Three history windows are also included in the system, accessible to both students and
system. The Student History window maintains a record of each action taken as students
continue to perform different explorations and experiments. In addition, the Market History
window keeps a record of all variables and associated values from every experiment conducted.
Finally, there is the Goal History window, providing a representation of economic concepts the
student has successfully learned as well as those yet to be learned.

In order to optimally induce the lawful regularities in this environment, a model sequence
of iterative behaviors in Smithtown would involve: exploring the world (informally), developing
a plan for investigation (more formally), choosing online tools or techniques for executing the
plan, collecting and recording data from the experiment, organizing the results, seeing if the
data confirm or negate prior beliefs, constructing a problem representation, modifying the problem
based on discrepant results, refining the problem based on additional information, recognizing
discrepancies between the result and expectations, testing findings in additional realms, and
finally, generalizing a principle or law. However, people differ in their application of these skills.
Individual differences will be discussed in the next section.



III. EXPERIMENT 1: EXPLORATORY STUDY OF
INDIVIDUAL DIFFERENCES AND LEARNING

The two main research questions underlying this investigation were: (a) Did individuals
interacting with Smithtown acquire as many economic concepts as students from a traditional
classroom environment? and (b) In terms of specific "learning indicators," what are the
characteristics of those individuals who were more successful in learning in this type of
environment as compared to those less successful? The data source for the first question
corresponded to scores on a battery of pretests and posttests of economic knowledge developed
by an economics expert working on the project. The data sources corresponding to the second
question were detailed computer history lists of all student actions as well as verbal protocols
from each student about justifications for each action. These data were used in computing
values for the learning indicators for each person across three 2-hour sessions with the tutor.

This exploratory study of individual differences and learning took a problem solving perspective.
Sternberg (1981) made a distinction between two forms of metacognition in problem solving:
global planning and local planning. Global planning refers to a strategy that applies to a set
of problems and does not focus on the characteristics of a particular problem. Local planning
refers to a strategy that is sufficient for solving a particular problem within a given set. Sternberg
finds that better reasoners spend more time in global planning of a strategy for problem solution
and relatively less time in local planning. Similarly, in studies of writing, Hayes and Flower
(1986) point out that experts attend more to global problems than do novices. Novices focus
on the conventions and rules of writing while experts make more changes that affect the
meaning of the text. In physics, differences in problem solving between novices and experts
also relate to surface and deep problem representations (Larkin, McDermott, Simon, & Simon,
1980; Simon & Simon, 1978). Experts work in a more top-down manner, indicating that a
general solution plan is in place before they begin the manipulation of specific equations, while
novices approach problems in a more bottom-up manner, manipulating equations to solve for the
unknown.

These findings indicate that individual differences in inductive problem solving can be defined
in terms of the global and local aspects of performance, or attention to specific versus more
general features of the problem solving task. In a discovery learning environment, following
findings by Klahr and Dunbar (1988), this distinction may be translated to data-driven performance
in contrast to behavior which is more hypothesis-driven. In the Smithtown environment/task,
an individual obtains data (either self- or computer-generated). On the basis of these data, the
individual then induces generalizations or hypotheses which drive further data collection, data
organization, and experimentation. Based on the literature cited above, it was anticipated that
good reasoners might display hypothesis-driven performance earlier in their discovery activity,
and use their hypotheses as performance goals in contrast to more sustained but indiscriminate
data collection. For example, a good subject may plan to test the effect of changing the
population of Smithtown on the demand for donuts, hypothesizing that if population increased,
the demand for donuts would increase. He or she would record baseline data on donuts,
make the desired change to population, record the new data, and compare these data. A less
planful subject may similarly change the population and look at the data within the donut
market, but without a higher level goal or hypothesis in mind. Data-driven induction is not
completely unacceptable since individuals come to the task (Smithtown) with preconceived
notions of regularities in the world of economics and they manipulate data and experiment on
the basis of their a priori hypotheses. So, the discovery process studied here does involve
some combination of data-driven induction and hypothesis-generated data which guide
performance.



IV. METHOD



Subjects 

Thirty undergraduate students enrolled at the University of Pittsburgh participated in this
study. None had any formal economics training or previous economics courses. The age
range was from 18 to 25 years and there was about an equal distribution of males and females
in this sample. Subjects comprising the Smithtown and control groups were obtained from
responses to campus advertising and paid for their participation. Subjects enrolled in an
introductory economics class volunteered to participate. All subjects were informed about the
purpose of the experiment at its conclusion.



Procedures 

Three groups of subjects were used in the study: (a) students who received classroom 
instruction on introductory economics, (b) a control group which received no economics 
instruction, and (c) students interacting with Smithtown. There were 10 subjects per group.
All subjects took a pretest battery of economic concepts, received their respective interventions,
and then took the posttest battery. The elapsed time between test batteries was about equal 
for all groups (i.e., about 2 weeks). The economics classroom group had two and one weeks 
of instruction on the issues of supply and demand; the control group simply returned in 2
weeks for the posttests (no intervention); and the Smithtown group spent, individually, about 5
hours interacting with the system, broken down into three sessions across 2 weeks.

The chapters covered by the economics class during the treatment phase corresponded to 
the identical material/curriculum covered by the group working with Smithtown (i.e., the same 
introductory economic principles involving the laws of supply and demand in a competitive 
market). 

Prior to their first real session with the system, the group using Smithtown were given a 
Guide to Smithtown. This three-page booklet informed them of their goal (i.e., to discover
principles and laws of economics) and how to best achieve that goal (i.e., to imagine themselves
as scientists, gathering data and forming and testing hypotheses about emerging economic 
principles and laws). The Guide overviewed some of the online tools available in Smithtown 
with examples provided on how to use them. Finally, the Guide emphasized that the individual 
would probably make errors or get stuck, but to try to learn from the mistakes. A glossary 
of terms (e.g., mouse, menu) concluded the Guide and the students were free to take it home
with them between sessions. The Guide did not contain any information about economics
principles. 

The test battery used in this study was developed by the author, in conjunction with an
economics instructor at the University of Pittsburgh. The battery consisted of two tests, multiple
choice and short answer, and parallel forms were constructed for pretests and posttests. The
tests were pilot tested to ensure clarity of instructions, proper timing, and the appropriate level
of difficulty. After test development, the batteries were reviewed by an independent economics
instructor for content validity (i.e., completeness and accuracy).



V. RESULTS



Group Comparisons 

The group means by testing occasion (pretest, posttest) and test type (multiple choice,
short answer) are presented in Table 1 and Figures 7a and 7b. First note in Table 1 and
Figure 7 that the three groups did not appear to differ on their pretest scores, which assessed
incoming economics knowledge (around 50% accuracy on both pretests). A post hoc MANOVA,
computed on data from the two pretests, confirmed this observation: F(4,54) = 0.49; p = 0.74.



Table 1. Percent Correct on Pretests and Posttests

                 Control               Classroom              Smithtown
              MC     SA    AVG       MC     SA    AVG       MC     SA    AVG

Pretest
  Mean      46.0   54.0   50.0     46.8   50.0   48.4     48.0   47.0   47.5
  SD        11.5   19.2             9.2   12.8            11.5   13.1

Posttest
  Mean      54.8   59.7   57.3     64.0   85.7   74.8     61.0   84.3   72.7
  SD        14.6   17.2             9.4    8.5            11.6    6.3

Note. MC represents the percent correct on the Multiple Choice test and SA represents the
percent correct on the Short Answer test. AVG is the average of these two scores.




Figure 7a. Experiment 1: Pretest and posttest data from short answer test (by groups).

Figure 7b. Experiment 1: Pretest and posttest data from multiple choice test (by groups).

The primary hypothesis of this study was that fostering the use of specific inquiry skills
should facilitate the learning of specific domain knowledge; in this case, economics. This was
tested by the interaction between testing occasion (i.e., pretest versus posttest for both the
multiple choice and short answer tests) and the instructional treatment (i.e., control, classroom,
and Smithtown). That is, did the three groups perform differently from pretest to posttest?
This interaction was significant when a MANOVA was computed on these data: F(4,54) = 5.66;
p < .001.

Subsequent analyses were also conducted on planned comparisons between the different
treatment groups. The first comparison, between the experimental group (Smithtown) and the
classroom group, was not significant, F(2,26) = 0.36; p = .70, implying that the two groups did
not differ on relative pretest to posttest improvements. However, the comparison between
treatment (classroom and Smithtown) and control was significant: F(2,26) = 16.86; p < .001.
Thus, the classroom and Smithtown groups showed equivalent improvements and greatly exceeded the
performance of the control group.

When the data were analyzed separately for each test type (multiple choice and short
answer), the comparison of classroom and Smithtown groups versus the control group revealed
no significant differences for the multiple choice test¹ (F(1,27) = 1.63) but a significant difference
for the short answer test (F(1,27) = 34.94; p < .001). That is, the instructional treatments
apparently had their greatest effect on the cognitively complex task of recalling and articulating
economic concepts (e.g., "List as many important factors as you can causing the demand
curve for a good or service to shift to the left or right") as opposed to the cognitively simpler
task of choosing a correct response from alternatives.

Thus, the Smithtown group, with considerably less time on task, performed the same as
students in the traditional classroom environment on tests of economic concepts. It is important
to note that although the economics classroom group received almost twice as much
instruction/exposure to the subject matter as did the Smithtown group (i.e., about 11 hours
versus 5 hours, respectively), the groups did not significantly differ on their posttest scores.
Moreover, the system did not tutor economic knowledge directly. Rather, the tutorial assistance
was in terms of directing the subjects' scientific skills.

Of particular interest to this research was the behavior within the Smithtown group. In other
words, what did the more successful subjects do differently from the less successful subjects
in terms of specific learning behaviors? For example, maybe the successful subjects were
simply more active in the environment, or recorded data into their notebooks more conscientiously,
or perhaps the effective subjects generated more testable hypotheses compared to the less
effective subjects.



Analysis of Successful and Unsuccessful Learning Behaviors

I examined whether the low level behavioral indicators relating to "activity level" differentiated
effective from ineffective learners, compared with the higher level indicators relating to "data
management" and "thinking and planning" skills (see Appendix A for a listing of all learning
indicators). These learning process data (or learning indicators) thereby allow the capture of
learning in progress and the examination of precise behaviors that separate more from less successful
performance in this guided discovery environment. The theoretical framework underlying this
research suggests a major role for the higher level indicators.

¹ In Figure 7b, the "adjusted percent correct" was used for the multiple choice test data to adjust the mean score for
guessing: Number right - (Number wrong / (number of alternative choices - 1)).
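The correction for guessing described in the footnote on the multiple choice data can be sketched in a few lines. This is a minimal illustration, not the scoring code used in the study: the function name and the example numbers are hypothetical, and dividing the adjusted count by the number of items to obtain a percentage is an assumption.

```python
def adjusted_percent_correct(num_right, num_wrong, num_choices, num_items):
    """Percent correct after the standard correction for guessing.

    adjusted count = right - wrong / (choices - 1); converting that count
    to a percent of all items is an assumption made for this sketch.
    """
    if num_choices < 2:
        raise ValueError("a multiple choice item needs at least two alternatives")
    adjusted_right = num_right - num_wrong / (num_choices - 1)
    return 100.0 * adjusted_right / num_items


# Hypothetical example: 20 four-alternative items, 14 right, 6 wrong.
score = adjusted_percent_correct(num_right=14, num_wrong=6, num_choices=4, num_items=20)
print(score)  # 14 - 6/3 = 12 adjusted items, i.e. 60.0 percent
```

Under this correction, a subject who guesses blindly on every four-alternative item is expected to score about zero rather than 25 percent.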

Table 2 presents the 10 experimental subjects with their pretest and posttest scores. Of
interest were those individuals scoring: (a) low on the pretests but high on the posttests (more
successful), and (b) low on the pretests and low on the posttests (less successful). I was not
interested in individuals who scored high on both the pre- and the posttests as that would
imply some domain-related incoming knowledge. Only one individual (CR) fell into this category.

As seen in Table 2, there are five subjects with large gain scores and four subjects with
smaller gains on the economic test batteries. Having made this distinction between "more" and
"less" successful learners in Smithtown, these two groups can be compared and contrasted by
their performance indicator data.



Table 2. Smithtown Subjects' Scores on the Economic Tests Combined:
Multiple Choice and Short Answer Tests

Subjects                                  Pre     Post    Gain

Large Gain (above mean)
  CF                                      40.2    83.4    43.2
  BW                                      53.7    89.9    36.2
  JH                                      42.7    73.9    31.2
  ML                                      54.2    84.4    30.2
  CS                                      42.7    70.4    27.7

Small Gain (below mean)
  SS                                      53.3    75.4    22.1
  JS                                      54.2    75.4    21.2
  HT                                      43.1    56.8    13.7
  OY                                      51.7    69.4    17.7

Constrained Gain (high on Pre and Posttests)
  CR                                      77.8    84.4     6.6

Overall Mean                              51.0    76.0    25.0
SD                                        12.0    10.0    11.0
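The grouping in Table 2 can be reproduced from the pretest and posttest scores. The sketch below is illustrative only: the scores are copied from Table 2, but the numeric cutoff used to flag a "high" pretest score is an assumption, since the report does not state one.

```python
from statistics import mean

# (pretest, posttest) percent correct, copied from Table 2.
scores = {
    "CF": (40.2, 83.4), "BW": (53.7, 89.9), "JH": (42.7, 73.9),
    "ML": (54.2, 84.4), "CS": (42.7, 70.4), "SS": (53.3, 75.4),
    "JS": (54.2, 75.4), "HT": (43.1, 56.8), "OY": (51.7, 69.4),
    "CR": (77.8, 84.4),
}

# Gain = posttest minus pretest, rounded to one decimal as in the table.
gains = {s: round(post - pre, 1) for s, (pre, post) in scores.items()}
mean_gain = mean(list(gains.values()))

# Assumption: "high on the pretest" is taken here to mean above 70 percent;
# the report does not give a numeric cutoff.
HIGH_PRETEST = 70.0

groups = {"large": [], "small": [], "constrained": []}
for subject, (pre, _post) in scores.items():
    if pre > HIGH_PRETEST:
        groups["constrained"].append(subject)
    elif gains[subject] > mean_gain:
        groups["large"].append(subject)
    else:
        groups["small"].append(subject)

print(round(mean_gain, 1))
print(groups)
```

With the overall mean gain as the dividing line, the split matches the table: five large-gain subjects, four small-gain subjects, and CR set aside.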



Data for each subject were collapsed across their three sessions with Smithtown into a
single index for each of 30 learning indicators. The indicators can be broken down into three
rational categories: (a) general activity level indicators, (b) data management skills, and (c)
thinking and planning behaviors. Each of these broad categories encompasses multiple individual
indicators (see Appendix A).

Two data sources were used to compute the performance indices: detailed computer history
lists of all student actions, and verbal protocols from each student about justification for each
action (i.e., what they expected to see after a particular action, and what their plans were for
further experimentation). Each person's indicator scores were thus tallied, then standardized
in relation to the other subjects.
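As a rough sketch of this standardization step, the example below converts raw tallies on one indicator to z-scores and then checks the gap between two groups against the one-standardized-unit criterion that this section uses to define a "difference". The tallies are invented for illustration, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw tallies to z-scores relative to the whole group."""
    m, s = mean(values), stdev(values)  # sample SD: an assumption
    return [(v - m) / s for v in values]


# Hypothetical tallies for one indicator (not data from the report).
# The first five subjects stand in for the "more successful" group,
# the last four for the "less successful" group.
raw_tallies = [12, 15, 11, 14, 13, 4, 6, 5, 3]
z = standardize(raw_tallies)

more_successful, less_successful = z[:5], z[5:]
gap = mean(more_successful) - mean(less_successful)

# Criterion from the report: a "difference" is a gap of at least one
# standardized unit between the two groups on an indicator.
differentiates = abs(gap) >= 1.0
print(round(gap, 2), differentiates)
```

Because every indicator is placed on the same standardized scale, gaps on indicators with very different raw ranges (counts of actions versus counts of hypotheses, for example) can be compared directly.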



As expected, more and less successful subjects differed mostly on performance measures
relating to thinking and planning skills (i.e., the category representing the most complex learning
indicators and reflecting effective experimental behaviors). There were fewer, but substantial,
differences on indicators from the data management skills category. Indicators from the activity
level category did not discriminate between the two sets of subjects (see Shute, Glaser, &
Raghavan, 1989, for a complete discussion of these results). "Differences" in this context were
defined as at least one standardized unit between the two groups per indicator. The particular
indicators that best differentiated subjects were as follows (ordered from most to least
differentiating):

Generalization. The more effective subjects would test their developing economic beliefs in
different markets to see if they were upheld, while the less effective subjects typically would
not initiate experimentation across markets. These behaviors are represented by indicators 22
and 23 (in Appendix A) and involve both generalizing emerging principles to related markets
(e.g., investigating the effects of a manipulation on substitute or complementary goods) and
testing beliefs out in unrelated markets to see the limits and extent of a particular concept.
Since some of the town factors have global effects and some have only limited effects, it is
good scientific practice to try out things in various markets. For instance, changing the prevailing
interest rate in Smithtown would affect the demand for large cars but not for ice cream, while
changing the population would impact the demand in both markets.

Complexity of Experiments. Effective subjects also completed more actions within a given
experiment and investigated fewer markets overall compared to the less effective subjects
(indicators 29 and 4). These behaviors reflected the richness and tenacity of an individual's
actions within an experiment. A thorough, systematic investigation of a concept was indicated
by more connected actions (e.g., repeatedly changing the price of a good until the market
reached equilibrium); aimless behavior was indicated by fewer connected actions. Furthermore,
across the three sessions, the more successful subjects' number of actions per experiment
increased, showing that their experiments became more complex as they gained additional
domain knowledge. This was not the case with the less successful group.

Systematic Variable Changes. Indicator 28 measured the number of variables manipulated
per experiment. Given the freedom of the environment, it could be tempting to make changes
to multiple variables concurrently; however, doing so would obscure which change caused
the resulting state of market affairs. The biggest problem for the less successful subjects
was that they persisted in changing multiple variables simultaneously. The more successful
subjects changed fewer variables at a time, typically just a single variable.

Adequate Data Collection. Another discriminating indicator from the thinking and planning
category involved collecting sufficient amounts of data before making a generalized hypothesis
regarding any of the economic concepts (indicator 24). Good scientific methodology involves
generalizing a concept based on enough examples or instances of a phenomenon rather than
on inadequate data which may include elements of chance or confounding variables. The
sufficiency criterion was set as having at least three related rows of notebook entries before
using the hypothesis menu. The more successful subjects did not attempt to make general
hypotheses prior to collecting enough data on a given concept, while the less successful subjects
were content to make impulsive generalizations based on inadequate data.
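One way to picture the sufficiency criterion is as a scan over a chronological action log. The sketch below is hypothetical throughout: the log format, the action names, and the choice to treat notebook rows recorded in the same market as "related" are all assumptions, since the report does not describe how the system stored or matched these events. Only the threshold of three rows comes from the text.

```python
MIN_RELATED_ROWS = 3  # threshold stated in the report


def count_sufficient_hypotheses(events):
    """Count hypothesis-menu uses preceded by enough related notebook rows.

    `events` is a chronological list of (action, market) tuples. Treating
    rows recorded in the same market as "related" is an assumption made
    for this sketch.
    """
    rows_per_market = {}
    sufficient = 0
    for action, market in events:
        if action == "notebook_entry":
            rows_per_market[market] = rows_per_market.get(market, 0) + 1
        elif action == "hypothesis":
            if rows_per_market.get(market, 0) >= MIN_RELATED_ROWS:
                sufficient += 1
    return sufficient


# Hypothetical session log (not data from the report).
log = [
    ("notebook_entry", "donuts"),
    ("hypothesis", "donuts"),        # only 1 related row: impulsive
    ("notebook_entry", "donuts"),
    ("notebook_entry", "donuts"),
    ("hypothesis", "donuts"),        # 3 related rows: sufficient
    ("notebook_entry", "gasoline"),
    ("hypothesis", "gasoline"),      # only 1 related row: impulsive
]
print(count_sufficient_hypotheses(log))  # 1
```

A tally of this kind would be high for subjects who gather data before generalizing and low for those who hypothesize impulsively.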

Planning an Experiment. Higher-level planning behavior (indicator 20) was demonstrated by
the more competent subjects. They tended to set up an experiment and execute it to completion.
Actions corresponding to this indicator involved selecting variables from the planning menu,
then utilizing those variables in subsequent controlled manipulations. This type of higher-level
planning was rarely evidenced among the less successful subjects. This result is in accord
with Sternberg's (1981) findings that persons scoring high on reasoning tests spent more time
than low scoring persons on global planning, and less time on local planning. Similarly,
Anderson (1987) investigated individual differences in students' solutions to Lisp programming
problems and found that the poorer students tended to be less planful in their problem solving
activities compared with better subjects.

Predicting Experimental Outcomes. The proficient subjects in this study tended to make
more outcome predictions for their experiments, while the less proficient subjects made
considerably fewer predictions (indicator 16). To illustrate, for an experiment involving the
increase of the price of gasoline, a valid prediction could be rendered that "Quantity demanded 
of gasoline will decrease." The more effective subjects stated their predictions on more occasions 
than the less effective subjects. However, there was no way of determining whether the fewer 
predictions of the less effective subjects were due to knowledge deficiency or to a general lack 
of motivation. 

Notebook Entries. In terms of data management skills, this study revealed that the successful 
subjects made more notebook entries, overall, compared with the less effective subjects (indicator 
9). In addition, those entries tended to be more consistent with, and relevant to, the focus of 
their investigation (indicator 13). For example, notebook entries were typically made by the 
proficient subjects following any variable changes, and the entered variables usually had been 
selected beforehand in the planning menu as those of interest. In addition, proficient subjects 
collected baseline data into their notebooks (i.e., values of variables before being altered),
something the less proficient subjects generally failed to do, rendering later comparisons to 
changed data very difficult. 

Relation of Successful Learning to Prior Scientific Training 

In addition to comparisons between subjects based on their standardized indicator values, 
demographic information was obtained from all 10 subjects. Two questions concerned previous 
scientific training: (a) what science courses the subject had taken since high school, and (b) 
what his or her major was. According to a hypothesis that different backgrounds caused the 
observed differences in scientific behaviors, it would be expected that the less successful 
subjects took fewer science courses. This was not the case. The less successful group had 
taken considerably more science courses since high school (total = 27) compared to the more 
successful group (total = 8). Moreover, the more and less successful groups had the same 
number of declared science majors (i.e., three per group). Thus, differential exposure to science 
training did not seem to determine who demonstrated better scientific behaviors. 



VI. EXPERIMENT 2: LARGE SCALE STUDY OF 
LEARNER DIFFERENCES IN SMITHTOWN 

The previous study found some evidence for individual differences in learning and discovery 
strategies. I next addressed two main questions. First, what is the relationship between general 
intelligence and learning outcome (i.e., knowledge and skill acquisition from Smithtown), and 
second, do the findings from Experiment 1 generalize to a large sample from a different 
population? Experiment 1 tested the effectiveness of the system in comparison to economics
learned in a traditional classroom environment and additionally found some areas differentiating 
more from less successful individuals in learning from the system. In Experiment 2, I included 
a measure of general intelligence in the analyses to examine the nature and range of individual 
differences in learning. In particular, how much of these differences are attributable to general 
intelligence (or general ability)? Is it, simply, an individual's general intelligence that determines
the nature and range of what they will learn, or is it something more, such as specific behaviors
and strategies which are trainable (unlike general intelligence, which is believed to be more fixed
and inflexible)?

As part of the Learning Abilities Measurement Program (LAMP) at the Air Force Human
Resources Laboratory, a group of 530 subjects was tested with a modified version of Smithtown
which automatically tallied and summarized performance indicators at the end of a 3 1/2-hour
session (instead of the approximately 5 hours spent by subjects in the exploratory study).



VII. METHOD 



Subjects 

Subjects consisted of 530 enlisted Air Force recruits on their 6th day of basic training at 
Lackland Air Force Base, Texas. The gender distribution of subjects was approximately 3/4 
males and 1/4 females. All subjects were between the ages of 17 and 27 years and had high 
school (or equivalent) educations. 



Procedure 

Subjects were given a briefing prior to the tutor which informed them of their "mission" (i.e., 
to manipulate the environment, acting as scientists, and to try to learn as many concepts as 
possible regarding basic laws of microeconomics). A short 5-minute game preceded Smithtown,
designed to familiarize them with the mouse and menus. They next read an online Guide to 
Smithtown, saw a demonstration of a simple, online experiment, then entered the hypothetical 
marketplace on their own. 

The number of concepts learned was the criterion measure (i.e., principles and laws correctly 
stated to the system via the hypothesis window). There were 12 concepts that could have 
been learned, and the subjects' criterion data ranged from 0 to 6. Since there were only 3.5
hours allotted for interaction, and the first hour or so was typically spent familiarizing oneself
with the environment, it was not surprising that the maximum number of concepts learned was 
only six. 

A measure of general intelligence was available for each subject. The Armed Forces
Qualification Test (AFQT) is a composite score derived from the Armed Services Vocational
Aptitude Battery (ASVAB; Department of Defense, 1984), consisting of the subtests: arithmetic
reasoning, word knowledge, paragraph comprehension, and numerical operations.



VIII. RESULTS



Cluster Analysis on Performance Indicators 

A hierarchical cluster analysis was computed based on the correlation matrix of the learning
indicators to reduce the number of indicators, described in Experiment 1, to a more manageable
set, and also to test alternative, objective groupings of the indicators (rather than the more
subjective, rational categorization used in Experiment 1). This procedure was not employed in
Experiment 1 because the small number of subjects would have made the results unreliable. I
employed the ADDTREE/P clustering program (Corter, 1982), which implements Sattath and
Tversky's (1977) Additive Similarity Tree model, because ADDTREE/P has been shown to fit
empirical data particularly well, especially discrete data such as the learning indicators. Applying
the ADDTREE program to the 27 x 27 correlation matrix of indicators yielded a moderately
good fit (r² = .71; stress = 0.06), but more importantly, yielded a highly interpretable solution.
(Note: Three of the original 30 indicators were not included in this matrix as they had excessive
interdependence or "communality" with other variables; thus they were removed.) At the top
level, the indicator data formed three main clusters: (a) Basic Activities, (b) Data Management,
and (c) Scientific Behaviors. These three clusters further decomposed at the next level down.
Basic Activities were subdivided into: (1) busy or undirected activities, and (2) directed activities;
Data Management subdivided into: (3) notebook usage, and (4) other tool applications; and
Scientific Behaviors were subdivided into: (5) data-driven inquiry, (6) organizing experiments,
and (7) hypothesis-driven inquiry behaviors. Note that this cluster analysis solution confirms
the rational specification of learning behaviors posited in the design phase of the system (Shute,
Glaser, & Raghavan, 1989). The cluster analysis solution is shown in Figure 8, and is characterized
below by the contributing variables.

Figure 8. Hierarchical cluster analysis solution of learning indicators.

1. Undirected Activities: This cluster is defined by the number of variables changed at
one time (VCPERTM), total number of independent variables changed (INDVAR), number of
variables changed that were specified in the planning menu (PMVC), number of notebook entries
of changed independent variables (ECIVAR), average number of variables changed per experiment
(VARCHAN), and the average number of actions taken within a particular experiment (AVGACTS).

2. Directed Activities: This cluster is defined by the number of times price changes were
made (PCHANGE), number of times the market sales information window was viewed (MSINFO),
the total number of notebook entries made (TOTALNB), and the number of times the experimental
frameworks were used to direct experiments (EXPFRAME).

3. Notebook Usage: This cluster is defined by the number of times data from past
experiments were inserted into the notebook (REINSERT), number of notebook entries of variables
that had been specified in the planning menu (PMNB), and number of times the market data
history window was viewed to see past variables and associated values (MDWINDOW).

4. Tool Usage: This cluster is defined by the number of times the table package was
applied (TABLE), number of times the graph package was used (GRAPH), number of markets
investigated (MARKETS), number of times baseline data were entered into the notebook (BDENTRY),
and number of times baseline data were observed (BDOBSERV).

5. Data-Driven Experiments: This cluster is defined by the number of specific predictions
made of an experimental outcome (PREDMADE), and number of times an experiment was
replicated (REPLIC).

6. Organizing Experiments: This cluster is defined by the number of times the computer
was requested to make price adjustments toward an equilibrium state (CPCHANGE), and number
of times the planning menu was used to organize an experiment (PLANMENU).

7. Hypothesis-Driven Experiments: This cluster is defined by the total number of hypotheses
made (TOTHYPO), number of times sufficient data were recorded prior to rendering a hypothesis
(ENUFDATA), number of times findings were generalized to unrelated markets (GENUN), ratio
of the number of correct hypotheses made divided by the total number of hypotheses (CORRHYPO),
and the number of times findings were generalized across related markets (GENREL).

Most of the clusters are readily interpretable. However, the distinction between Cluster 5
(data-driven experimentation) and Cluster 7 (hypothesis-driven experimentation) requires some
elaboration. When a person conducts a local experiment (e.g., increases the number of compact
car suppliers in Smithtown) and renders a specific prediction about the ramifications (e.g., the
price of compact cars will go down), it is characterized as data-driven experimentation. This
contrasts with a more general or hypothesis-driven experiment where an individual will attempt
to generalize specific, local findings across different markets (e.g., investigating the relationship
between price and quantity demanded in the gasoline, lumber, and ground beef markets),
inducing a general principle operating in a competitive market.



Correlational and Regression Analyses 

Seven composite scores, one for each major cluster category, were computed for each
subject by summing standardized indicator scores within each cluster. The correlations of these
variables with the criterion measure (number of concepts learned) can be seen in Table 3.




Table 3. Correlation Between Composite Indicators
and Number of Concepts Learned

Composite performance             Number of
indicator                         concepts learned

Gross Activities                      -.08
Directed Activities                    .05
Notebook Usage                        -.12*
Tool Usage                            -.08
Data-driven Behaviors                  .03
Organization                           .06
Hypothesis-driven Behaviors            .65**

N = 530.
*p < .01.
**p < .001.
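The composite scores behind Table 3 are sums of standardized indicator scores within a cluster, each then correlated with the number of concepts learned. The sketch below illustrates that computation on invented data for six subjects. The variable names echo two indicator labels from the hypothesis-driven cluster, but the numbers, the use of the sample standard deviation, and the helper functions are assumptions, not the study's data or code.

```python
from math import sqrt
from statistics import mean, stdev


def standardize(values):
    """z-scores relative to the sample (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical tallies for six subjects on two indicators of one cluster.
tothypo = [2, 5, 1, 7, 4, 6]            # total hypotheses made
enufdata = [1, 4, 0, 5, 2, 5]           # hypotheses made after sufficient data
concepts_learned = [0, 3, 0, 5, 2, 4]   # hypothetical criterion scores

# Composite = sum of standardized indicator scores within the cluster.
composite = [a + b for a, b in zip(standardize(tothypo), standardize(enufdata))]

r = pearson_r(composite, concepts_learned)
print(round(r, 2))
```

Standardizing before summing keeps an indicator with a large raw range from dominating the composite.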



From these data, it is apparent that the indicators relating to hypothesis-driven behaviors
(i.e., the effective scientific inquiry skills) were the most highly correlated with successful learning.
In addition, spending too much time managing the online notebook showed a small yet significant
negative effect on subsequent learning.

Regression analyses of these data were computed testing full and restricted models. First,
all seven variables and two-way interactions were tested (full model) predicting the criterion of
number of concepts learned. This resulted in a multiple R = .70. Next, a backward elimination
of the interactions was performed, and only three interactions remained in the equation (multiple
R = .69). Finally, a regression analysis with backward elimination of the main effects was
performed, and the results included the following main effects and interactions in the equation
(multiple R = .69; F(7,522) = 66.81, p < .001): Undirected activities, Directed activities,
Organization, Hypothesis-driven behaviors, Organization by Hypothesis-driven behaviors,
Undirected activities by Hypothesis-driven behaviors, and Directed activities by Hypothesis-driven
behaviors.

The three significant two-way interactions are characterized as follows. The interaction
involving the variables Organization and Hypothesis-driven behaviors (t = -4.3; p < .001)
showed that if a person had a low value for hypothesis-driven behaviors, he or she would
benefit (i.e., learn more concepts) from organizing and planning experiments. On the other
hand, if a person had a high value for hypothesis-driven behaviors, less time spent planning
and organizing, and more time spent actively and systematically conducting experiments, was
better as far as learning more concepts. The significant interaction involving Undirected activities
by Hypothesis-driven behaviors (t = -4.9; p < .001) showed a similar pattern where, for low
values of hypothesis-driven experimentation, a person slightly benefited from more activities in
the environment, but for higher levels of hypothesis-driven behaviors, more focused behaviors
led to the acquisition of the subject matter. A different pattern is seen with the interaction of
the variables Directed activities and Hypothesis-driven behaviors (t = 2.90; p < .01). If a
person did not act in a hypothesis-driven manner, engaging in more directed actions was not
helpful in learning economic concepts. However, if a person was more hypothesis-driven, he
or she would benefit from directed activities carried out in conjunction with scientific behaviors.
Although these interactions are interesting, they only account for about 4% of the variance in
the dependent measure while the majority of variance (42%) is explained by the single variable:
Hypothesis-driven behaviors.
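The variance figures quoted above can be checked against the reported coefficients with simple arithmetic, since squaring a correlation gives the proportion of variance explained. The short sketch below uses only values reported in this section. Note that the squared simple correlation of a single predictor is only an approximation of its contribution inside a multi-predictor model.

```python
r_hypothesis = 0.65   # Table 3: Hypothesis-driven behaviors vs. concepts learned
multiple_r = 0.69     # final regression model

var_single = r_hypothesis ** 2                 # variance explained by the single predictor
var_model = multiple_r ** 2                    # variance explained by the whole model
var_remaining_terms = var_model - var_single   # share left for all other terms

print(f"{var_single:.0%}")           # 42%
print(f"{var_model:.0%}")            # 48%
print(f"{var_remaining_terms:.0%}")  # 5%
```

The remainder of roughly 5% is shared among the other main effects and the three interactions, which is in line with the 4% attributed to the interactions.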

The mean Armed Forces Qualification Test (AFQT) percentile score for this sample of 530
subjects was 64.9 with a standard deviation of 17.3. These data are in accord with data calibrated
on national probability samples of 16- to 23-year-old American youth (OASD/MRA&L, 1982).
The simple correlation between the measure of general aptitude, the AFQT, and number of
concepts learned (r = .18; p < .01) indicates that some amount of general intelligence is
implicated in the learning outcome. However, when AFQT was included in a regression analysis
involving the variables discussed above, the amount of unique variance accounted for by AFQT
in predicting the number of concepts learned was less than 1%, compared to 38% of the unique
variance attributable to hypothesis-driven behaviors (Cluster 7). Thus, while general intelligence
is certainly a component of learning, specific scientific behaviors account for considerably more
direct variance in the criterion measure.
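The notion of unique variance can be illustrated with a simplified two-predictor version of this analysis, built only from correlations reported in the text and in Tables 3 and 4. This is a sketch under a stated simplification: the actual regression also contained the other composite variables and interactions, so the figures below approximate the reported values rather than reproduce them. The function name is hypothetical.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for a criterion regressed on two predictors, from their correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Correlations reported in the text and in Tables 3 and 4.
r_concepts_hypo = 0.65   # concepts learned vs. hypothesis-driven behaviors
r_concepts_afqt = 0.18   # concepts learned vs. AFQT
r_hypo_afqt = 0.24       # hypothesis-driven behaviors vs. AFQT

r2_both = r_squared_two_predictors(r_concepts_hypo, r_concepts_afqt, r_hypo_afqt)

# Unique variance = drop in R^2 when the predictor is removed from the model.
unique_afqt = r2_both - r_concepts_hypo ** 2
unique_hypo = r2_both - r_concepts_afqt ** 2

print(f"{unique_afqt:.2%}")  # 0.06%
print(f"{unique_hypo:.0%}")  # 39%
```

Even in this reduced model, AFQT adds well under 1% once hypothesis-driven behaviors are in the equation, while hypothesis-driven behaviors add close to 40% beyond AFQT.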

Another question asked by this research concerned the correlation between each of the
composite variables and general intellectual ability. In other words, which behaviors did the
subjects with higher AFQT scores engage in during Smithtown interactions? These correlations
are shown in Table 4.



Table 4. Correlation Between Composite Indicators
and General Aptitude (AFQT Score)

Composite performance indicator      AFQT score

Undirected Activities                    .06
Directed Activities                      .27**
Notebook Usage                          -.07
Tool Usage                              -.10
Data-driven Behaviors                    .13*
Organization                             .07
Hypothesis-driven Behaviors              .24**

N = 530.
*p < .01.
**p < .001.



The pattern of correlations suggests that the high ability individuals engaged in directed,
systematic activities, approaching the task in a manner concurrently bottom-up (data-driven)
and top-down (hypothesis-driven). This was achieved by first conducting local experiments,
then gradually expanding the scope of the findings across markets to test and refine developing
hypotheses. It may have been that subjects' high general ability enabled them to collect local
data while having a goal state in mind.
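
As a rough check on the significance markers in Table 4, a correlation can be converted to a t statistic with N - 2 degrees of freedom. The sketch below uses only the correlations and N = 530 given in the table. The p-values rely on a standard normal approximation to the t distribution, which is an assumption made here for simplicity (reasonable with 528 degrees of freedom) and not the procedure used in the original analysis.

```python
import math


def correlation_t(r, n):
    """t statistic for testing that a population correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)


def approx_two_tailed_p(t):
    """Two-tailed p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


N = 530
TABLE_4 = {
    "Undirected Activities": 0.06,
    "Directed Activities": 0.27,
    "Notebook Usage": -0.07,
    "Tool Usage": -0.10,
    "Data-driven Behaviors": 0.13,
    "Organization": 0.07,
    "Hypothesis-driven Behaviors": 0.24,
}

for name, r in TABLE_4.items():
    t = correlation_t(r, N)
    p = approx_two_tailed_p(t)
    print(f"{name:28s} r = {r:+.2f}   t = {t:6.2f}   approx. p = {p:.4f}")
```

With this approximation, the correlations of .27 and .24 fall below .001, the correlation of .13 falls between .001 and .01, and the remaining values do not reach the .01 level, which matches the markers in the table.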



Cluster Analysis on Cases 

In contrast to a cluster analysis of variables, a cluster analysis on cases can detect consistent
patterns or styles of interacting with Smithtown and the effectiveness of these approaches for
ultimate knowledge acquisition. For example, someone may adopt the more obsessive
approach of changing variables and conscientiously recording all values in the online notebook,
regardless of relevance. This style contrasts with a more systematic approach of generating
a hypothesis about some variable relationships, making the appropriate change(s), recording
only the relevant data, and observing the results of the change.

A cluster analysis on cases (i.e., subjects as opposed to variables) allocates individual
cases to clusters, classifying them based on the (squared) Euclidean distances between cases
and clusters. Each case is assigned to the cluster for which its distance to the classification
center is smallest. This analysis was performed with respect to the three higher level composite
variables: Activities, Data Management, and Scientific Behaviors. The cluster analysis produced
five distinct clusters of subjects, shown in Table 5.
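
The assignment step described above can be sketched as follows. This is an illustration only: it assumes that the profile values reported in Table 5 can stand in for the classification centers, and the example cases are hypothetical rather than drawn from the data. It shows only the nearest-center assignment, not the full clustering procedure.

```python
# Cluster profiles on (Activities, Data Management, Scientific Behaviors).
# Values copied from Table 5; treating them as classification centers is an assumption.
CENTERS = {
    1: (0.50, -0.19, -0.11),
    2: (-0.35, 0.33, -0.23),
    3: (-0.31, -0.23, 0.32),
    4: (1.15, 1.85, 0.10),
    5: (0.98, -0.17, 0.99),
}


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_cluster(case, centers=CENTERS):
    """Return the label of the center with the smallest squared distance to the case."""
    return min(centers, key=lambda label: squared_euclidean(case, centers[label]))


# Hypothetical subjects: (activities, data management, scientific behaviors).
active_and_scientific = (1.0, -0.2, 1.0)
busy_data_manager = (1.2, 1.9, 0.0)

print(assign_cluster(active_and_scientific))
print(assign_cluster(busy_data_manager))
```

A subject who is both active and scientific lands nearest the Cluster 5 profile, whereas a subject who is active mainly in managing data lands nearest the Cluster 4 profile.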



Table 5. Cluster Solution of Composite Learning Behaviors

Cluster              1       2       3       4       5

ACTIVE             .50    -.35    -.31    1.15     .98
DATA MGT          -.19     .33    -.23    1.85    -.17
SCIENTIFIC        -.11    -.23     .32     .10     .99
N                  170     183     153      11      13
No. Concepts       .21     .14     .78      .0    1.38



The different clusters of subjects were then compared in terms of the criterion measure.
An ANOVA was performed on the data with number of concepts learned^ as the dependent
variable and cluster membership (five groups) as the independent variable. Groups differed
significantly: F(4, 525) = 31.10; p < .001. As seen in Table 5, the group learning the most
concepts (i.e., Cluster 5, N = 13) was characterized by relatively high levels of effective
scientific behaviors and activities. The group learning the fewest concepts (i.e., Cluster 4,
N = 11) engaged in high activity and data management behaviors but fewer scientific behaviors.
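
For readers less familiar with the test, the following minimal sketch shows how a one-way ANOVA F ratio and its degrees of freedom are formed. The group sizes are those in Table 5; the toy scores are hypothetical and serve only to exercise the function, so the resulting F value has no relation to the value reported above.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Degrees of freedom implied by the five cluster sizes in Table 5.
CLUSTER_SIZES = [170, 183, 153, 11, 13]
df_between = len(CLUSTER_SIZES) - 1
df_within = sum(CLUSTER_SIZES) - len(CLUSTER_SIZES)
print(f"df = ({df_between}, {df_within})")

# Hypothetical toy data: concepts learned by three small groups.
toy_groups = [[0, 0, 1], [0, 1, 0], [1, 1, 2]]
f_ratio, dfb, dfw = one_way_anova(toy_groups)
print(f"F({dfb}, {dfw}) = {f_ratio:.2f}")
```

The degrees of freedom computed from the cluster sizes, 4 and 525, agree with those of the reported F statistic.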

To test the hypothesis that engaging in only scientific behaviors is a sufficient condition for
success in this type of environment, post hoc comparisons were computed testing the difference
between Cluster 3 (the group evidencing only scientific behaviors) and the other groups. In
four contrasts (i.e., Clusters 1 and 3, 2 and 3, 4 and 3, and 5 and 3), the subjects in Cluster
3 learned significantly more concepts than all other groups, except those in Cluster 5. Thus,
the subjects learning the most from Smithtown were those that engaged in scientific behaviors
and were, in general, active in their environment, albeit in a directed manner. The less
successful individuals in Smithtown (e.g., Cluster 4) spent most of their time managing data,
busily occupied in a less directed manner, and not being very scientific during the learning
process.

When the two groups learning most and least (Clusters 5 and 4, respectively) are compared
on their profiles, both groups can be seen as having high loadings on the "activities" variable.
Since the three groupings are orthogonal, an interpretation of this pattern is that when learners
engage in any of the indicators tallied under "activities," they will be successful in Smithtown
only if they are goal or hypothesis-driven, conducting experiments that are systematically planned
and executed. This stands in contrast to those being "active" only with the goal state (local
level) of having their data be arranged neatly.



^The mean number of concepts learned per group was fairly low. This may be due to several factors: the time on the
system was very short (about two hours), the population was different (i.e., basic recruits as compared to university students),
and the system did not always recognize some of the alternative representations of concepts, and thus may not have tallied the
concept as being learned.

In summary, Experiment 2 found significant differences in knowledge outcome that were
directly related to hypothesis-driven behaviors. When a measure of general intelligence was
investigated in relation to the learning criterion, specific behaviors (i.e., those involved with goal
or hypothesis-driven activities) were found to be much stronger predictors of successful learning
in this type of environment than was the measure of general intelligence, which tends to be a
more stable trait. Perhaps the measure of general intelligence (AFQT) exerted its influence on
learning indirectly through the inquiry behaviors. These particular scientific behaviors are
presumably trainable if they can be specified into rules, which is what was done in the "inductive
inquiry skills" knowledge base in Smithtown.



IX. GENERAL DISCUSSION

In a computerized laboratory environment, students had the opportunity to engage in active,
discovery learning of economic concepts by manipulating variables in a hypothetical town and
seeing the repercussions. Overall, the system worked as expected: Tutoring on the scientific
inquiry skills resulted in learning the domain knowledge as a by-product, as evidenced in
Experiment 1, where performance on the posttest by Smithtown subjects was comparable to the
performance by subjects from an introductory economics class.

In general, it appears that in the rather complex task involved in these two studies, many
of the behaviors that differentiated successful and less successful subjects are similar to those
identified in previous studies with both laboratory and more realistic tasks (e.g., Klahr & Dunbar,
1988; Shrager, 1985; Sternberg, 1985). Individual differences in performance from Experiment
1 were primarily a function of the hypothesis-driven behaviors applied by the subjects during
Smithtown interaction. In particular, findings from Experiment 1 showed that the most effective
learning behaviors were related to the category: Thinking and planning skills. Similarly, from
Experiment 2, there was a strong correlation (r = .65) between the composite variable:
Hypothesis-driven experimentation and the dependent measure: Number of concepts learned.
The cluster analysis conducted on cases confirmed this finding whereby the two groups of
subjects who learned the most concepts (i.e., Clusters 3 and 5) were set apart from the other
groups by virtue of their application of scientific behaviors. These subjects were interrogating
the discovery world in a systematic manner, generating (top-down) and then testing (bottom-up)
hypotheses about possible relationships among the economic variables. The less effective
groups spent more time managing their data and doing other "local" activities in the environment.

In summary, the successful individuals in both experiments employed more powerful heuristics
compared to the less successful individuals. They manipulated fewer variables, holding variables
constant while one variable was systematically explored. Less successful subjects did not seem
to realize the power of this heuristic. Successful subjects took their time to generate sufficient
evidence before coming to a conclusion while the less successful subjects were more impulsive
and attempted to induce generalizations based on inadequate information. The more effective
subjects tended to think in terms of generalizing their hypotheses and explorations beyond the
specific experiment or market they were working on. They conceived of a lawful regularity as
a general principle and as a description of a class of events rather than a local description.
These subjects were also more sensitive to the existence of deeper explanatory principles in
addition to local data descriptions; they appeared to realize that discovery was not only a
function of data, but that they needed to generate some rule that could provide them with a
goal for their actions. In this sense they tended to be more hypothesis-driven than the less
successful subjects.

In regard to inductive problem solving, as Greeno and Simon (1988) state and as Klahr and
Dunbar (1988) describe the interplay between rules and instances, the best learning strategy is
a combination of bottom-up and top-down processing. For subjects from the two experiments
described in this paper, this seemed to be the case: The better subjects would predict variable
relationships and then test those hypotheses out, concurrently exploring and collecting data
which led to further generalizations. Less effective subjects seemed to be limited to a more
data-driven (or bottom-up) approach, often falling short of grasping the larger picture. This is
in accord with findings from investigations of novice-expert differences in problem solving (e.g.,
Larkin, McDermott, Simon, & Simon, 1980). Furthermore, the importance of higher level planning
in this inductive discovery environment is in agreement with studies of individual differences in
reasoning tasks (e.g., Sternberg, 1985). Successful subjects consistently planned an experiment
and then executed it to completion, according to plan, in sharp contrast to the more haphazard,
less planful approach applied by less successful subjects in their experimental methodologies.

In Experiment 2, there was a significant correlation between the composite variable:
hypothesis-driven behaviors and a general intelligence measure: AFQT score (r = .24; p <
.001). This implies that the brighter individuals in the sample of 530 tended to be more
systematic and controlled in their learning behaviors than those with lower AFQT scores.
Furthermore, the correlation between AFQT score and the learning criterion was r = .18,
implicating general intelligence in the final learning outcome. However, AFQT score
accounted for only a small proportion (< 1%) of the learning outcome variance while the specific
indicators, subsumed under the variable: hypothesis-driven behaviors, accounted for a much
larger proportion of outcome variance (38%). The importance of these findings for instruction
is that the particular scientific behaviors outlined in this paper (e.g., generalizing concepts
across different markets, collecting sufficient instances of a phenomenon prior to stating a
hypothesis, etc.) can be trained and hence, individuals can learn to be more methodical and
scientific, thereby leading to the induction of general principles.

Learning from any complex environment is believed to represent a four way interaction
involving: (a) the subject matter or curriculum, (b) the instructional environment (e.g., discovery,
didactic), (c) the desired knowledge outcome (e.g., mental model, automatic skill), and (d)
learner style (e.g., passive versus active, holistic versus analytic processing) (see Kyllonen &
Shute, 1989, for a complete discussion of this interaction). In terms of these four dimensions,
Smithtown may be characterized as follows. First, the subject matter is microeconomics as
well as scientific inquiry skills. Second, the instructional environment is a guided discovery
environment where tutorial assistance is on the inquiry skills, not economics knowledge. Third,
the desired knowledge outcome is a mental model of how the laws of supply and demand
operate in a competitive market and also how to systematically conduct experiments to extract
the various laws and relationships. Finally, learner style was free to vary so that optimal and
suboptimal behaviors in this environment could be determined.

For this type of environment, knowledge outcome, and subject matter, the optimal
learner behaviors found from the two experiments are systematic, hypothesis-driven activities.
What about those subjects who are not characterized by these attributes? One way in which
an intelligent tutoring system can increase its effectiveness is to adapt to an individual's strengths
and weaknesses. In the case of Smithtown, this would take the form of providing more guidance
for those less scientifically oriented on the particular skills determined to be important to learning
from Smithtown. Since this system, as implemented in both experiments, was more discovery
learning than guided, the more effective subjects were more self-directed and scientific. To
optimize learning for all subjects, additional guidance, at least in the beginning sessions, is
required for the less scientific persons. To make the program more flexible (i.e., to adapt its
level of guidance based on subject behaviors) one additional rule could be incorporated into
the "teaching strategies" module. This rule could check the student model (i.e., the "batting
averages" per critic) for evidence of students' buggy or floundering behaviors, then intervene
with immediate feedback until the behavior in question was no longer being demonstrated.
Statistics are already maintained by the system in the student model on the frequency of
unsystematic behaviors; thus the real-time adjustment of the current threshold of intervention
would provide for additional tutoring on those inquiry skills that were most difficult.



^For both experiments discussed in this paper, we set the threshold relating to the coach's intervention fairly high. That
is, a subject needed to demonstrate 3 buggy behaviors or errors of omission before the coach would provide feedback. This
threshold is modifiable and alternative environments may be created by manipulating the threshold value (e.g., turn it off
completely for a discovery world, or set it to 1 to give immediate feedback).
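
One way such a rule might operate is sketched below. This is not Smithtown's implementation: the class names, the cutoff of .5 on the batting average, and the minimum of four observations are hypothetical choices made for illustration. Only the default threshold of 3 and the immediate-feedback setting of 1 come from the description of the coach given in this paper.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 3    # errors tolerated before feedback, as set in both experiments
IMMEDIATE_THRESHOLD = 1  # feedback on every error


@dataclass
class CriticRecord:
    """Hypothetical per-critic tally kept in the student model."""
    hits: int = 0      # opportunities handled systematically
    misses: int = 0    # buggy behaviors or errors of omission
    pending: int = 0   # misses accumulated since the last feedback
    threshold: int = DEFAULT_THRESHOLD

    @property
    def batting_average(self):
        total = self.hits + self.misses
        return self.hits / total if total else 1.0


class Coach:
    """Adjusts the intervention threshold per critic in real time (assumed rule)."""

    def __init__(self, cutoff=0.5, min_observations=4):
        self.cutoff = cutoff                      # hypothetical floundering cutoff
        self.min_observations = min_observations  # hypothetical evidence requirement
        self.model = {}

    def observe(self, critic, correct):
        """Record one observation; return True if the coach gives feedback now."""
        rec = self.model.setdefault(critic, CriticRecord())
        if correct:
            rec.hits += 1
        else:
            rec.misses += 1
            rec.pending += 1
        seen = rec.hits + rec.misses
        floundering = seen >= self.min_observations and rec.batting_average < self.cutoff
        # Lower the threshold while the student flounders; restore it on recovery.
        rec.threshold = IMMEDIATE_THRESHOLD if floundering else DEFAULT_THRESHOLD
        if not correct and rec.pending >= rec.threshold:
            rec.pending = 0
            return True
        return False


coach = Coach()
feedback = [coach.observe("change_one_variable", False) for _ in range(4)]
print(feedback)
```

In this sketch the first intervention arrives only after the third error, as in the experiments, but once the tally shows consistent floundering the next error draws feedback immediately. The threshold returns to its default once the batting average recovers.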

In conclusion, two studies have been described of individual differences in learning from an
exploratory environment. Although in both studies the tutor only assisted on procedural problem
areas (i.e., those related to various scientific inquiry behaviors), subjects did seem to extract
domain knowledge during the course of their investigations and experimentations within Smithtown.
Tutoring on the scientific inquiry skills did result in learning some principles and laws of
microeconomics. Although there was not enough information from the larger study about a
subject's prior knowledge of economics to make a valid treatment effect statement, that information
was included in the smaller study (i.e., all subjects were selected on the basis of having no
formal economics background).

Some of the skills and behaviors which are important to scientific discovery have started
to be delineated, and the behaviors identified in this paper agree with the findings from related
research (e.g., Klahr & Dunbar, 1988; Langley, Simon, Bradshaw, & Zytkow, 1987). In addition,
these specific behaviors relate to individual differences found in general studies on problem
solving, concept formation, and so on. From an instructional perspective, the behaviors denoted
in this paper can consequently serve as a focal point for relevant intervention studies. From
a design perspective, findings from these studies suggest changes to intelligent tutoring systems,
in general, so that they may be more like the individualized teaching technologies they have
the potential to be.

APPENDIX: ORIGINAL 30 LEARNING INDICATORS



General Activity Level 

1. Total number of actions. 

2. Total number of experiments.

3. Number of changes made to the price of the good.

4. Number of markets investigated.

5. Number of independent variables changed.

6. Number of computer-adjusted prices.

7. Number of times market sales information was viewed.

8. Number of baseline data observations of market in equilibrium.



Data Management Skills

9. Total number of notebook entries.

10. Number of baseline data entries of market in equilibrium.

11. Entry of changed independent variables.

12. Number of reinsertions of changed independent variables to the online notebook.

13. Number of "relevant" notebook entries divided by total number of notebook entries
where "relevant" means variables specified in the Planning Menu.

14. Number of times the table package was used "correctly" divided by the total
number of times the table was used, where "correctly" means less than 6 variables
tabulated, and sorting was done on variables with differing values.

15. Number of times the graph package was used "correctly" divided by the total
number of times the graph was used, where "correctly" means plotting relevant
variables, saving graphs, and superimposing graphs with a shared axis.

16. Number of specific predictions made divided by the number of general hypotheses
made.

17. Number of correct hypotheses divided by the total number of hypotheses made.



Thinking and Planning Skills

18. Number of notebook entries of Planning Menu items. 




19. Number of times notebook entries of Planning Menu items were made divided by
the number of planning opportunities the subject had.

20. Number of times variables were changed that had been specified beforehand in
the Planning Menu.

21. Number of times an experiment was replicated.

22. Number of times a concept was generalized across unrelated goods.

23. Number of times a concept was generalized across related goods.

24. Number of times the student had sufficient data for a generalization (i.e., at least
3 data points in the notebook before using the Hypothesis Menu).

25. Number of times a change to an independent variable was sufficiently large
(i.e., greater than 10% of the possible range).

26. Number of times one of the experimental frames was selected (i.e., chose "same
good, change variable," "change good, same variables" or "change good, change
variable").

27. Number of times the Prediction Menu was used to specify a particular outcome 
to an event. 

28. Number of variables changed per experiment. 

29. Average number of actions per experiment. 

30. Number of economic concepts learned per session. 
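
Several of the indicators above are ratios or threshold checks, and the sketch below shows how three of them (Indicators 13, 24, and 25) might be computed from a session log. The function names, the variable names in the example, and the convention of returning 0 when the notebook is empty are assumptions made for illustration; only the definitions themselves (relevance to the Planning Menu, at least 3 data points, a change greater than 10% of the possible range) are taken from the list.

```python
def relevance_ratio(notebook_entries, planned_variables):
    """Indicator 13: relevant notebook entries divided by all notebook entries."""
    if not notebook_entries:
        return 0.0  # assumed convention for an empty notebook
    relevant = sum(1 for variable in notebook_entries if variable in planned_variables)
    return relevant / len(notebook_entries)


def has_sufficient_data(n_data_points, minimum=3):
    """Indicator 24 criterion: at least 3 data points before using the Hypothesis Menu."""
    return n_data_points >= minimum


def is_large_enough_change(old, new, low, high, fraction=0.10):
    """Indicator 25 criterion: change greater than 10% of the variable's possible range."""
    return abs(new - old) > fraction * (high - low)


# Hypothetical session fragment.
planned = {"price", "quantity_demanded"}
notebook = ["price", "quantity_demanded", "population", "price"]

print(relevance_ratio(notebook, planned))
print(has_sufficient_data(2))
print(is_large_enough_change(old=1.0, new=2.5, low=0.0, high=10.0))
```

Tallying the two boolean checks over a session would yield the counts described in Indicators 24 and 25.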






